Tokenization and Morphological Analysis for Malagasy

Authors

  • Mary Dalrymple
  • Maria Liakata
  • Lisa Mackie
Abstract

The authors present a tokenizer and finite-state morphological analyzer [Beesley and Karttunen 2003] for Malagasy, based primarily on the discussion of Malagasy morphology in Keenan and Polinsky [1998] and Randriamasimanana [1986]. Words in Malagasy are built from roots by means of a variety of morphological operations such as compounding, affixation and reduplication. The authors analyze productive patterns of nominal and verbal morphology, and describe genitive compounding and suffixation for nouns and various derivational processes involving compounding and affixation for verbs. This work offers a computational analysis of Malagasy morphology, and forms the basis of a computational grammar and lexicon of Malagasy within the framework of the PARGRAM project.

Similar resources

Language Independent Morphological Analysis

This paper proposes a framework of language independent morphological analysis and mainly concentrates on tokenization, the first process of morphological analysis. Although tokenization is usually not regarded as a difficult task in most segmented languages such as English, there are a number of problems in achieving precise treatment of lexical entries. We first introduce the concept of morpho...

A Two-level Morphology of Malagasy

We present a two-level model of Malagasy nominal and verbal morphology (Beesley and Karttunen, 2003), based primarily on the discussion of Malagasy morphology in Keenan and Polinsky (1998) and Randriamasimanana (1986). Words in Malagasy are built from roots by means of a variety of morphological operations such as affixation and reduplication. The present paper analyzes productive patterns of n...

Consistent and Flexible Integration of Morphological Annotation in the Arabic Treebank

Treebank Annotation Issue: Multiple Levels of Annotation • Annotation not on the source text, but more abstract representation • How to maintain annotation consistency and relation between different levels? • How to make available the multiple levels of representation for the user? Arabic Treebank as a case study: • Mapping between two levels of annotation: • Morphological analysis of source te...

Tokenizing an Arabic Script Language

In any natural language processing project, the input text needs to undergo tokenization before morphological analysis or parsing. For Arabic script languages the tokenization process faces more problems and it plays a more crucial role in natural language processing (NLP) systems for Arabic script languages. In this work we elaborate on some of these problems and present solutions for these. T...

Arabic Tokenization, Part-of-Speech Tagging and Morphological Disambiguation in One Fell Swoop

We present an approach to using a morphological analyzer for tokenizing and morphologically tagging (including part-of-speech tagging) Arabic words in one process. We learn classifiers for individual morphological features, as well as ways of using these classifiers to choose among entries from the output of the analyzer. We obtain accuracy rates on all tasks in the

Journal title:
  • IJCLCLP

Volume 11  Issue 

Pages  -

Publication date 2006